Abstract. We show that if a big set of integer points $S \subseteq [0,N]^d$, $d > 1$,
occupies few residue classes mod $p$ for many primes $p$, then it must essentially
lie in the solution set of some polynomial equation of low degree. This answers
a question of Helfgott and Venkatesh.



1. Introduction 

One of the main topics of study in analytic number theory is the distribution 
of sets of integers in residue classes. Examples abound, but folkloric ones include 
Dirichlet's theorem, which tells us that the primes are uniformly distributed along 
primitive residue classes, and the open problem of determining how large the
least quadratic non-residue may be.

On the other hand, in the expanding subject of arithmetic combinatorics, much
of the focus has been on establishing what are known as inverse theorems, in which
one starts with a set having a specific arithmetic property and wishes to use this
information to give a characterization of the set. Notable examples of this include
Freiman type theorems (see for instance [BJ [TOl HZ] and the survey [Ij), inverse 
theorems for the Gowers norm [7] and the inverse Littlewood-Offord theory [18, 19].

This paper is concerned with the problem of connecting both lines of inquiry by
establishing an inverse theorem for the distribution of sets in residue classes. Since
we would expect a random set to be fairly well distributed, the main question here
is whether a set occupying very few residue classes for many primes $p$ has to have
some specific structure. The remarkable observation that this might indeed be the
case is due to Croot and Elsholtz [3] and Helfgott and Venkatesh [11]. Writing $[N]$
for the set of integers $\{0, \dots, N\}$, their observation can be summarized in the following
principle:

Inverse Sieve Problem. Suppose a set $S \subseteq [N]^d$ occupies very few residue classes
mod $p$ for many primes $p$. Then, either $S$ is small, or it possesses some strong
algebraic structure.



There is a good reason why such inverse sieve results are of much interest in 
number theory. One of the main features of sieve theory is the uniformity of its 
results, which is a consequence of the fact that sieves only take into account the 
cardinality of the classes occupied by the set. However, a clear drawback of this 
is that the bounds thus obtained are limited to what happens in extremal cases. 
By stating that such extremal sets must have a very specific structure, inverse 
results should allow one to retain the uniformity of the sieve while providing much 
stronger bounds. The reader may consult the book of Kowalski [13, §2.5] for further
discussion of the potential applications of this phenomenon and [J] for applications 
of similar classifications in arithmetic combinatorics. 



The author was partially supported by a CONICET doctoral fellowship.

In this paper we give a satisfactory answer to the inverse sieve problem for
every $d \ge 2$. In order to discuss our results, suppose we are given a big integer
set $S \subseteq [N]^d$ occupying $O(p^{d-1})$ residue classes in $(\mathbb{Z}/p\mathbb{Z})^d$ for many primes $p$.
What does this imply about $S$? By the Lang-Weil inequality, we know that this
condition is satisfied by the set of integer points of a proper algebraic variety of
small degree, and one would expect a partial converse to also hold. That is,
any big set $S \subseteq [N]^d$ occupying only that many residue classes for every prime $p$
should essentially be contained inside the solution set of a polynomial of low degree.
When $d = 1$ this follows from Gallagher's larger sieve [8] (not to be confused with
the conjecture discussed in §6.3). The case $d = 2$ was proven by Helfgott and
Venkatesh in [11], by applying the Bombieri-Pila determinant method [1] to obtain
a two-dimensional generalization of the larger sieve. Although their methods are
only capable of handling the case $d \le 2$, they conjectured that such an inverse
theorem should in fact hold for every dimension $d$. In this paper we introduce a
different approach and use it to answer this question by giving the following best
possible result.

Theorem 1.1. Let $0 \le k < d$ be integers and let $\varepsilon, \alpha, \eta > 0$ be positive real numbers.
Then, there exists a constant $C$ depending only on the above parameters, such that
for any set $S \subseteq [N]^d$ occupying less than $\alpha p^k$ residue classes for every prime $p$ at
least one of the following holds:

(i) ($S$ is small) $|S| \ll_{d,k,\varepsilon,\alpha} N^{k-1+\varepsilon}$.

(ii) ($S$ is strongly algebraic) There exists a polynomial $f \in \mathbb{Z}[x_1, \dots, x_d]$ of degree
at most $C$ and coefficients bounded by $N^C$ vanishing at more than $(1 - \eta)|S|$
points of $S$.

Theorem 1.1 is sharp. Indeed, the reader may consult Section 5 for examples
of sets of size $|S| \gg N^{k-1}$ occupying less than $p^k$ residue classes for every prime $p$
but possessing no algebraic structure. On the other hand, we only need to require
from $S$ that it occupies few residue classes for sufficiently many small primes (see
Theorem 2.4). More generally, we will show in Theorem 6.1 that assuming some
necessary regularity conditions, every set of size $\gg N^{\varepsilon}$ occupying few residue classes
for many primes $p$ must satisfy condition (ii). In Section 6.2 we shall give an easy
application of this generalization to the characterization of functions preserving 
some structure when reduced to prime moduli. 

Taking $d = 2$ in Theorem 1.1 we recover the result of [11]. Actually, the methods
of Helfgott and Venkatesh are capable of handling the case $k = 1$ of Theorem 1.1,
that is, when $S$ is assumed to occupy only $O(p)$ residue classes. However, the
approach fails as soon as the set occupies more than $p \log p$ classes. The reason for
this is that their method, as well as the larger sieve itself, is in essence a counting
argument (see §3.1) and therefore needs the number of classes occupied by $S$ to be
small, while the high dimensional setting requires us to take advantage of the local
density of $S$ being small. This type of obstacle is not specific to the problem at hand,
but arises whenever one tries to extend this kind of sieve to higher dimensional
settings (see [12, Remark 3] for some discussion). So while we do make use of the
larger sieve, in order to establish Theorem 1.1 we need to introduce an approach
that overcomes this difficulty by taking advantage of the structure of the set and
which we believe to be applicable in more general situations.

The rest of the paper is organized as follows. After setting up some notation,
in Section 2 we state and discuss Proposition 2.2, which is the main ingredient of
the paper, and use it to deduce Theorem 1.1. This proposition says that while a
polynomial identity of degree $r$ usually requires $O(r^d)$ points to be tested, if one is
interested in sets satisfying hypotheses similar to those of Theorem 1.1 then one can
verify such identities for a positive proportion of the set using only $O(r^k)$ points.
Then, in Section 3 we review some facts about the larger sieve and apply them
to obtain a key uniformization lemma. Using this, the proof of Proposition 2.2 is
carried out in Section 4. Finally, in Section 5 we construct several examples showing that
our results are sharp, while in Section 6 we discuss further consequences of our methods
as well as the remaining case ($d = 1$) of the inverse sieve problem.

Acknowledgment. The author would like to thank his advisor Roman Sasyk for 
several comments and suggestions during the preparation of the paper. 

2. A CONDITIONAL PROOF OF THEOREM 1.1 

2.1. General notation. We now fix some notation. By $O_{c_1,\dots,c_k}(X)$ we shall mean
a quantity which is bounded by $C_{c_1,\dots,c_k} X$ where $C_{c_1,\dots,c_k}$ is some finite positive
constant depending on $c_1, \dots, c_k$. Also, we shall write $Y \ll_{c_1,\dots,c_k} X$ to mean
$|Y| = O_{c_1,\dots,c_k}(X)$. However, since we will generally be concerned with the study
of a set $S$ satisfying the hypothesis of Proposition 2.2 for some parameters $d, h, \kappa$
and $\varepsilon$ as in the statement of that proposition, we will free up some notation by
assuming that all implied constants in the $O$, $\ll$ notation always depend on these
parameters even though this may not be explicitly stated. So for instance $Y \ll X$
stands for $Y \ll_{d,h,\kappa,\varepsilon} X$. Throughout the paper we will let the letter $c$ denote a
small positive constant whose exact value may vary at each occurrence.

Given a statement $\phi(x)$ with respect to an element $x \in [N]^d$ we will write $1_{\phi(x)}$
for the function which equals 1 if $\phi(x)$ is true and 0 otherwise. Also, we shall write
$\pi_i : \mathbb{Z}^d \to \mathbb{Z}$, $1 \le i \le d$, for the projection to the $i$th coordinate.

The letter $p$ will always refer to a prime number. We write $\mathcal{P}$ for the set of
primes and given any magnitude $Q$, we denote by $\mathcal{P}(Q)$ the set of primes $p \le Q$. Since
we will usually need to consider the weight $\frac{\log p}{p}$ over $\mathcal{P}$, for a finite subset $P \subseteq \mathcal{P}$
we write $w(P) := \sum_{p \in P} \frac{\log p}{p}$. We shall use the estimates $w(\mathcal{P}(Q)) = \log Q + O(1)$
and $\sum_{p \in \mathcal{P}(Q)} \log p \ll Q$ without explicit mention.
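The two estimates for the weight can be checked numerically. The following Python sketch is only an illustration; the helper names `primes_up_to` and `weight` are ours and do not appear in the paper. It prints $w(\mathcal{P}(Q)) - \log Q$ and $\frac{1}{Q}\sum_{p \le Q} \log p$, both of which should remain bounded as $Q$ grows.

```python
import math

def primes_up_to(q):
    """Sieve of Eratosthenes: all primes p <= q."""
    if q < 2:
        return []
    is_prime = [True] * (q + 1)
    is_prime[0] = is_prime[1] = False
    for n in range(2, math.isqrt(q) + 1):
        if is_prime[n]:
            for multiple in range(n * n, q + 1, n):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]

def weight(primes):
    """w(P) = sum over p in P of log(p)/p."""
    return sum(math.log(p) / p for p in primes)

for q in (10**3, 10**4, 10**5):
    ps = primes_up_to(q)
    # First column: w(P(Q)) - log Q; second column: (sum of log p) / Q.
    print(q, weight(ps) - math.log(q), sum(math.log(p) for p in ps) / q)
```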

2.2. Characteristic sets. The purpose of this section is to state Proposition 2.2,
which is the key ingredient of the paper, and use it to derive Theorem 1.1. What this
proposition essentially says is that for any ill-distributed set $S$ as in the statement
of Theorem 1.1 one may find a very small "characteristic" subset $A \subseteq S$ such that
if a small polynomial vanishes at $A$ then it also vanishes at a positive proportion of
$S$. The task of proving Theorem 1.1 is thus reduced to that of finding a polynomial
which vanishes at $A$, and this will always be possible since $A$ is small.

Before proceeding, we need to define exactly what we mean for a polynomial
to be small. Given a parameter $N$ and some integer $d \ge 1$, by an $r$-polynomial,
for a positive integer $r$, we shall mean any polynomial $f$ with integer coefficients
satisfying $|f(n)| < N^{3r}$ for every $n \in [N]^d$. The exponent $3r$ is chosen in order
to guarantee that if $N$ is sufficiently large in terms of $r$ and $d$, then a polynomial
$f \in \mathbb{Z}[x_1, \dots, x_d]$ of degree at most $r$, with coefficients bounded in absolute value
by $N^r$, is an $r$-polynomial. This leads us to the following definition.

Definition 2.1. Let $0 < \delta \le 1$ be a positive real number and $r > 0$ some integer.
We say a subset $A$ of a set $S$ is $(r, \delta)$-characteristic for $S$ if we can find some subset
$A \subseteq B \subseteq S$ of size $|B| \ge \delta|S|$ such that whenever an $r$-polynomial vanishes at $A$,
then it also vanishes at $B$.

We can now state Proposition 2.2, which says that ill-distributed sets always
admit characteristic subsets.

Proposition 2.2. Let $d, h \ge 1$ be arbitrary integers and $\varepsilon > 0$ some positive real
number. Set $Q = N^{\varepsilon/2d}$ and let $P \subseteq \mathcal{P}(Q)$ satisfy $w(P) \ge \kappa \log Q$ for some $\kappa > 0$.
Also, let $r$ be an arbitrary positive integer. Suppose $S \subseteq [N]^d$ is a set of size
$|S| \gg N^{d-h-1+\varepsilon}$ occupying at most $\alpha p^{d-h}$ residue classes mod $p$ for every prime
$p \in P$ and some $\alpha > 0$. Then, if $N$ is sufficiently large, there exists a set $A \subseteq S$ of
size $|A| = O(r^{d-h})$ which is $(r, \delta)$-characteristic for $S$, for some $\delta \gg 1$ independent
of $S$ or $r$.

Remarks. The exact value of $Q$ in the above statement is irrelevant and may be
replaced by any small power of $N$. The reason why we have made the change of
variables $h := d - k$ with respect to Theorem 1.1 is that in the arguments to follow
we shall always set the quantity $h$ to be fixed and induct on $d$. We believe it is
simpler to introduce this change of notation at an early stage.

To see why such a result might be expected consider some polynomial $f$ vanishing
at an integer point $x$. Since polynomials descend to congruence classes, this means
that for any other integer $y$ satisfying $y \equiv x \pmod{p}$ for a prime $p$, we will have
$p \mid f(y)$. Thus, if we are given a set $S$ which occupies very few residue classes, one
may then hope to find a small subset $A$ such that given some $y \in S$ there are a lot
of primes $p$ for which $y \equiv x \pmod{p}$ for some $x \in A$. It would then follow that if
a polynomial $f$ vanishes at $A$ then there would be many primes $p$ dividing $f(y)$. If
furthermore $f$ is small, then this can only hold if $f(y) = 0$.
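The heuristic can be made concrete with a toy computation. The Python sketch below is an illustration under our own assumptions: the polynomial $f(x, y) = y - x^2$, the chosen points and the helper `forced_to_vanish` are hypothetical and not taken from the paper. It checks that a zero of $f$ forces divisibility by $p$ at every congruent point, and that a small integer divisible by primes whose product is at least as large as the size bound must be zero.

```python
def f(x, y):
    """Hypothetical example polynomial f(x, y) = y - x^2."""
    return y - x * x

# A zero of f, and a second point in the same residue class modulo p = 7.
a = (3, 9)
p = 7
y = (a[0] + 2 * p, a[1] - 5 * p)
assert f(*a) == 0
assert f(*y) % p == 0          # p divides f(y) although f(y) != 0

def forced_to_vanish(value, primes, bound):
    """True when `value` is divisible by every prime in `primes`,
    |value| < bound, and the product of the primes is >= bound.
    These three facts together force value == 0."""
    product = 1
    for q in primes:
        if value % q != 0:
            return False
        product *= q
    return abs(value) < bound and product >= bound

# Among all integers of absolute value below 200, only 0 passes the test.
survivors = [v for v in range(-199, 200) if forced_to_vanish(v, [2, 3, 5, 7], 200)]
print(survivors)
```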

On the other hand, the size hypothesis on $S$ is necessary. For instance, one may
construct small (logarithmic size) sets $S \subseteq [N]$ as in [11, §4.3] which occupy few
residue classes for large moduli just because they are small, but which however have
at most one element in each residue class, making the above argument unviable in
this situation. Furthermore, it is clear that a similar pathology occurs in higher
dimensions, by considering for instance the product set $S \times [N]$. For the general
construction of this type of sets and to see that in fact one cannot take $\varepsilon = 0$ in
Proposition 2.2 the reader is referred to §5.

In order to deduce Theorem 1.1 from Proposition 2.2 we will need to find a
polynomial which vanishes at a specific set of points. This will be accomplished in
a standard way by means of Siegel's lemma.

Lemma 2.3 (Siegel). Suppose we are given a system of $m$ linear equations

\[ \sum_{j=1}^{n} a_{ij} \beta_j = 0 \qquad \forall\, 1 \le i \le m, \]

in $n$ unknowns $(\beta_1, \dots, \beta_n)$, $n > m$, where the coefficients $(a_{ij})$ are integers not all
equal to 0 and bounded in magnitude by some constant $C$. Then, the above system
has a non trivial integer solution $(\beta_1, \dots, \beta_n)$ with $|\beta_j| \le 1 + (Cn)^{m/(n-m)}$ for all
$1 \le j \le n$.
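For very small systems the conclusion of Lemma 2.3 can be verified by exhaustive search. The Python sketch below is only an illustration: the function names and the two sample systems are our own choices, and brute force is used in place of the pigeonhole argument that proves the lemma.

```python
from itertools import product

def siegel_bound(C, m, n):
    """The bound 1 + (C*n)^(m/(n-m)) from Lemma 2.3."""
    return 1 + (C * n) ** (m / (n - m))

def small_solution(rows):
    """Search for a nonzero integer vector beta with rows @ beta = 0 and
    every |beta_j| at most the Siegel bound. Only practical for tiny systems."""
    m, n = len(rows), len(rows[0])
    C = max(abs(a) for row in rows for a in row)
    limit = int(siegel_bound(C, m, n))      # largest integer within the bound
    for beta in product(range(-limit, limit + 1), repeat=n):
        if any(beta) and all(
            sum(a * b for a, b in zip(row, beta)) == 0 for row in rows
        ):
            return beta
    return None

print(small_solution([[3, -2, 1]]))                       # m = 1, n = 3
print(small_solution([[1, 2, 3, -1], [2, -1, 0, 3]]))     # m = 2, n = 4
```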

Proof of Theorem 1.1 assuming Proposition 2.2. Let the hypothesis be as in the
statement of Theorem 1.1 and write $h := d - k$. Assume condition (i) fails, so
that $|S| \gg N^{d-h-1+\varepsilon}$. We claim that for any given integer $r$ there exists a set
$A \subseteq S$ of size $|A| = O_\eta(r^{d-h})$ which is $(r, 1 - \eta)$-characteristic for $S$, provided $N$
is sufficiently large. To see this we begin by noticing that Proposition 2.2 implies
the existence of some $\delta \gg 1$ such that for every subset $S' \subseteq S$ with $|S'| \ge \eta|S|$
there exists a set $A' \subseteq S'$ of size $|A'| = O_\eta(r^{d-h})$ which is $(r, \delta)$-characteristic for
$S'$. From now on we fix this value of $\delta$. Let $A_0$ be such a characteristic subset for $S$
and let $B_0$ consist of those elements of $S$ at which every $r$-polynomial vanishing
at $A_0$ also vanishes, so in particular $|B_0| \ge \delta|S|$. If $\delta \ge 1 - \eta$ we are done, otherwise we
have that $S_1 := S \setminus B_0$ satisfies $|S_1| > \eta|S|$ and therefore contains a characteristic
subset $A_1 \subseteq S_1$ as above. If we now let $B_1$ denote those points of $S_1$ at which
every $r$-polynomial vanishing at $A_1$ also vanishes, we see that either we get the claim with
$A = A_0 \cup A_1$ or the set $S_2 := S_1 \setminus B_1$ satisfies $\eta|S| < |S_2| \le (1 - \delta)^2|S|$. After
iterating this process $j$ times we see that if the set $A = \bigcup_{i=0}^{j-1} A_i$ is not $(r, 1 - \eta)$-characteristic
for $S$ then we can find some $S_j \subseteq S$ with $\eta|S| < |S_j| \le (1 - \delta)^j|S|$.
Since this last possibility cannot hold for some large $j = O_\eta(1)$, the claim follows.
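The number of iterations needed in the claim depends only on $\delta$ and $\eta$: the process must stop once $(1-\delta)^j < \eta$. The short Python sketch below (the function name `max_iterations` is ours) computes the smallest such $j$ for sample values, which are chosen purely for illustration.

```python
def max_iterations(delta, eta):
    """Smallest j with (1 - delta)**j < eta, for 0 < delta < 1 and 0 < eta < 1."""
    j = 0
    remaining = 1.0
    while remaining >= eta:
        remaining *= 1.0 - delta   # proportion of S that may still be uncovered
        j += 1
    return j

print(max_iterations(0.5, 0.1))
```

Since each step contributes a characteristic subset of size $O_\eta(r^{d-h})$, the union $A$ has size at most this number of steps times $O_\eta(r^{d-h})$.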

Now it only remains to find some $r$-polynomial $f$ which vanishes at $A$ and which
is of the form given in Theorem 1.1. We may assume $d \mid r$. We thus consider the
system of $|A|$ linear equations in $(r/d + 1)^d$ unknowns given by

\[ \sum_{i = (i_1, \dots, i_d) \le r/d} \beta_i a^i = 0 \qquad \forall a \in A, \tag{2.1} \]

where $i \le l$ stands for $i_j \le l$ for all $1 \le j \le d$ and where we use the multi-index
notation $a^i = a_1^{i_1} \cdots a_d^{i_d}$ for $a = (a_1, \dots, a_d)$. Notice that $|a^i| \le N^r$. If we now
choose $r = O_\eta(1)$ large enough so that $(r/d + 1)^d > 3|A|$ it follows by Siegel's lemma
that there exists an integer solution to (2.1) with $|\beta_i| \ll N^{r/2} < N^r$ provided
$N$ is sufficiently large. We thus see that the polynomial $f := \sum_{i \le r/d} \beta_i x^i$ is of the
desired form (assuming again that $N$ is sufficiently large) and, taking $C = r$, this
concludes the proof of Theorem 1.1. □
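The existence of a nonzero polynomial vanishing on a small point set is a linear algebra statement about system (2.1). The Python sketch below illustrates this with exact rational elimination; it is not the argument used above, since it gives no control on the size of the coefficients (that is what Siegel's lemma provides). The function names and the sample points are our own assumptions.

```python
from fractions import Fraction
from itertools import product
from math import lcm

def vanishing_polynomial(points, max_exp):
    """Return {exponent tuple: integer coefficient} for a nonzero polynomial of
    degree at most max_exp in each variable that vanishes at all `points`.
    Requires more monomials than points, as in system (2.1)."""
    d = len(points[0])
    monomials = list(product(range(max_exp + 1), repeat=d))
    if len(monomials) <= len(points):
        raise ValueError("need more unknowns than equations")
    # One row per point a, one column per monomial value a^i.
    rows = []
    for a in points:
        row = []
        for i in monomials:
            value = 1
            for coord, e in zip(a, i):
                value *= coord ** e
            row.append(Fraction(value))
        rows.append(row)
    # Gauss-Jordan elimination over the rationals.
    pivots, r = [], 0
    for col in range(len(monomials)):
        pivot = next((k for k in range(r, len(rows)) if rows[k][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        rows[r] = [v / rows[r][col] for v in rows[r]]
        for k in range(len(rows)):
            if k != r and rows[k][col] != 0:
                factor = rows[k][col]
                rows[k] = [x - factor * y for x, y in zip(rows[k], rows[r])]
        pivots.append(col)
        r += 1
        if r == len(rows):
            break
    # Set the first free variable to 1 and solve for the pivot variables.
    free = next(c for c in range(len(monomials)) if c not in pivots)
    beta = [Fraction(0)] * len(monomials)
    beta[free] = Fraction(1)
    for row_index, col in enumerate(pivots):
        beta[col] = -rows[row_index][free]
    scale = lcm(*(b.denominator for b in beta))   # clear denominators
    return {i: int(b * scale) for i, b in zip(monomials, beta) if b != 0}

def evaluate(poly, point):
    """Evaluate a polynomial given as {exponent tuple: coefficient}."""
    total = 0
    for exps, coeff in poly.items():
        term = coeff
        for coord, e in zip(point, exps):
            term *= coord ** e
        total += term
    return total

A = [(0, 0), (1, 1), (2, 4), (3, 9), (4, 16)]   # sample points on y = x^2
poly = vanishing_polynomial(A, 2)               # 9 monomials, 5 equations
print(poly, [evaluate(poly, a) for a in A])
```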

Notice that we have actually proved the following slight strengthening of Theorem 1.1
in which the set $S$ is only required to be badly distributed in a dense subset
of the primes.

Theorem 2.4. Let $0 \le k < d$ be integers and let $\varepsilon, \eta > 0$ be positive real numbers.
Set $Q = N^{\varepsilon/2d}$ and let $P \subseteq \mathcal{P}(Q)$ satisfy $w(P) \ge \kappa \log Q$ for some $\kappa > 0$. Suppose
$S \subseteq [N]^d$ is a set of size $|S| \gg N^{k-1+\varepsilon}$ occupying at most $\alpha p^k$ residue classes
mod $p$ for every prime $p \in P$ and some $\alpha > 0$. Then there exists a polynomial
$f \in \mathbb{Z}[x_1, \dots, x_d]$ of degree $O_\eta(1)$ and coefficients bounded by $N^{O_\eta(1)}$ which vanishes
at more than $(1 - \eta)|S|$ points of $S$.

Remark. Since we have already mentioned that the exact value of $Q$ in Proposition
2.2 is irrelevant, it follows that Theorem 2.4 also holds with $Q$ any small power of
$N$.

3. Applying the larger sieve in high dimensions 

3.1. A review of the larger sieve. We will now quickly review some facts about 
Gallagher's larger sieve and use them to prove two easy lemmas which we shall need 
later. For further discussion of the larger sieve and its consequences the reader may 
consult Section 2.2] and of course Gallagher's original paper [8].

Before proceeding we need to state some further notation that will be used in
this and the next sections. When studying a set $S \subseteq [N]^d$ we will denote by $[S]_p$
the set of residue classes mod $p$ occupied by $S$. Given such a set $S$, we shall be
largely concerned with how many elements of $S$ belong to a given residue class, so it
is important for us to have a specific notation for this subset. Thus, given a residue
class $\mathbf{a} = (a_1, \dots, a_d) \pmod{p}$ we write $S(\mathbf{a}; p)$ to refer to those elements of $S$ which
are congruent to $\mathbf{a} \pmod{p}$. Moreover, we shall sometimes consider some $a \in \mathbb{Z}/p\mathbb{Z}$
and write $S(a; p)$ for those elements of $S$ having their first coordinate congruent
to $a \pmod{p}$. Since we will always use the bold font $\mathbf{a}$ to denote a vector residue
class and since where this class lives shall be clear from the context, we
believe the similarity of both notations will not cause any confusion. Finally, if $p$
is fixed, we may simply write $S(\mathbf{a})$ and $S(a)$ for the above sets.

Fix now a set $S$ and consider some parameter $Q$. The main idea of the larger
sieve is to count in two different ways the number of distinct pairs $x, y \in S$ and
primes $p \le Q$ such that $x \equiv y \pmod{p}$. Given two such integers $x, y \in [N]$ it is
clear that those primes for which they are congruent are exactly those dividing
$|x - y| \le N$ and therefore

\[ \sum_{p \le Q} \sum_{\substack{x, y \in S \\ x \ne y}} 1_{x \equiv y \,(\mathrm{mod}\, p)} \log p \le |S|^2 \log N. \tag{3.1} \]

On the other hand, we have that the left hand side of (3.1) equals

\[ \sum_{p \le Q} \sum_{a \,(\mathrm{mod}\, p)} |S(a; p)|^2 \log p - |S| \sum_{p \le Q} \log p. \tag{3.2} \]

Notice that the above argument also works if $S \subseteq [N]^d$ since if $p$ is a prime for
which $x \equiv y \pmod{p}$ then $p$ must divide $|\pi_i(x) - \pi_i(y)|$ which is bounded by $N$.

As an example, we have the following result due to Gallagher [8]. Suppose we are
given a set $S \subseteq [N]$ occupying at most $\alpha p$ residue classes, then the Cauchy-Schwarz
inequality implies

\[ \sum_{a \,(\mathrm{mod}\, p)} |S(a; p)|^2 \ge \frac{1}{\alpha p} |S|^2. \]

Combining this with (3.1) and (3.2) we obtain

\[ \frac{1}{\alpha} \log Q + O\!\left( \frac{Q}{|S|} \right) \le \log N + O(1). \]

Taking $Q = |S|$ we conclude that $|S| \ll N^{\alpha}$.
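The double counting behind (3.1) and (3.2) is easy to test on a concrete set. The Python sketch below is an illustration under our own assumptions: the set of squares in $[N]$, the values $N = 2000$ and $Q = 40$, and all function names are chosen by us. It computes the pair count directly and through the class sizes $|S(a;p)|$, and compares both with $|S|^2 \log N$.

```python
import math
from collections import Counter

def primes_up_to(q):
    """All primes p <= q by trial division (adequate for small q)."""
    return [n for n in range(2, q + 1)
            if all(n % k for k in range(2, math.isqrt(n) + 1))]

def pair_count(S, primes):
    """Left side of (3.1): sum of log p over primes p and ordered pairs
    x != y in S with x = y (mod p)."""
    return sum(math.log(p) for p in primes
               for x in S for y in S if x != y and (x - y) % p == 0)

def class_count(S, primes):
    """Expression (3.2): the same quantity computed from the sizes |S(a;p)|."""
    total = 0.0
    for p in primes:
        sizes = Counter(x % p for x in S)
        total += math.log(p) * (sum(n * n for n in sizes.values()) - len(S))
    return total

N, Q = 2000, 40
S = [n * n for n in range(math.isqrt(N) + 1)]   # the squares in [N]
P = primes_up_to(Q)
lhs, rhs = pair_count(S, P), class_count(S, P)
print(lhs, rhs, len(S) ** 2 * math.log(N))
```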

For the purposes of this paper, we need to apply Gallagher's sieve in a slightly 
more general context. Precisely, we will use the following lemma. 

Lemma 3.1. Let $X \subseteq [N]$ be some set of integers and set $Q = N^{\gamma}$ for some $\gamma > 0$.
Let $c_1, c_2 > 0$ be positive real numbers. Suppose there is a set of primes $P \subseteq \mathcal{P}(Q)$
with $w(P) \ge c_1 \log Q$ such that for every $p \in P$ there are at least $c_2|X|$ elements of
$X$ lying in at most $\alpha p$ residue classes for some $\alpha > 0$ independent of $p$. Then, if $\alpha$
is sufficiently small in terms of $c_1, c_2$ and $\gamma$, it must be $|X| < Q$.

Proof. Again, we count the number of pairs $x, y \in X$ and $p \in P$ with $x \equiv y \pmod{p}$.
On one hand, we have as before that

\[ \sum_{p \in P} \sum_{x \ne y} 1_{x \equiv y \,(\mathrm{mod}\, p)} \log p \le |X|^2 \log N. \tag{3.3} \]

On the other hand, using the Cauchy-Schwarz inequality we see that our hypothesis
on $X$ implies

\[ \sum_{a \,(\mathrm{mod}\, p)} |X(a; p)|^2 \ge \frac{1}{\alpha p} (c_2 |X|)^2, \]

from where it follows that the left hand side of (3.3) is at least

\[ \frac{c_1 c_2^2}{\alpha} |X|^2 \log Q + O(Q|X|). \]

It is then clear that if $\alpha$ is sufficiently small, then the only way for (3.3) to hold is
to have $|X| < Q$. □

Finally, we prove the following easy consequence of the larger sieve which already
handles the case $d = h$ of Proposition 2.2.

Lemma 3.2. Let $Q = N^{\gamma}$ for some $\gamma > 0$ and let $P \subseteq \mathcal{P}(Q)$ be some set of primes
with $w(P) \ge c_1 \log Q$ for some $c_1 > 0$. Let $S \subseteq [N]^d$ occupy less than $c_2$ residue
classes mod $p$ for every prime $p \in P$ and some constant $c_2$. Then $|S| = O_{c_1, c_2, \gamma}(1)$.

Proof. Gallagher's sieve implies in this case

\[ \left( \frac{|S|}{c_2} - 1 \right) |S| \sum_{p \in P} \log p \le |S|^2 \log N, \]

and clearly, for sufficiently large $N$, this can only hold if $|S| \le c_2$. □

3.2. Genericity. Our strategy to prove Proposition 2.2 will be to partition $S$ into
many lower dimensional subsets and apply induction. However, the main obstacle
we encounter in doing so (and which is not merely a technical issue, as can be seen
from the examples in §5) is the possibility that the resulting subsets are rather
independent from each other, in the sense that they do not share many residue
classes. If this happens, then the fact that a small polynomial vanishes at one of
these subsets will not give us much information about the behavior of this polynomial
in the other subsets. However, in order for this to happen it would be necessary for
these subsets to occupy very few residue classes and this would imply the existence
of too many elements in each subset occupying the same residue class. While with
our hypothesis one cannot guarantee that this never happens, the goal of this section
is to show that this indeed does not happen on average, which will be sufficient for
our arguments.

We begin with the following definition. 

Definition 3.3 (Genericity). Given a real number $B > 0$ and some integer $l \ge 0$
we say that a set $S \subseteq [N]^d$ is $(B, l)$-generic mod $p$ if

\[ \frac{|S(\mathbf{a}; p)|}{|S|} \le \frac{B}{p^l} \]

for every residue class $\mathbf{a} \pmod{p}$.
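Genericity is a statement about how evenly a set spreads over the residue classes it occupies: no class mod $p$ may contain more than a $B/p^l$ proportion of the set. The Python sketch below checks this condition directly; the function name `is_generic` and the two sample sets are our own illustrative choices.

```python
from collections import Counter

def is_generic(S, p, B, l):
    """Check that |S(a;p)| / |S| <= B / p**l for every residue class a mod p.
    S is a list of integer tuples; the test is done in exact integer arithmetic."""
    sizes = Counter(tuple(c % p for c in point) for point in S)
    return all(n * p ** l <= B * len(S) for n in sizes.values())

# The full grid {0,...,20}^2 is perfectly equidistributed mod 7.
grid = [(x, y) for x in range(21) for y in range(21)]
print(is_generic(grid, 7, 1, 2))

# A set lying in a single residue class mod 7 is as far from generic as possible.
stacked = [(7 * i, 7 * j) for i in range(3) for j in range(3)]
print(is_generic(stacked, 7, 1, 1), is_generic(stacked, 7, 7, 1))
```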

Given a set of primes $P \subseteq \mathcal{P}(Q)$ we shall write $P' \sim P$ to mean a subset $P' \subseteq P$
with $w(P') \gg w(P)$. Recall that by our conventions in §2.1 the implied constants
depend on the parameters $d, h, \varepsilon, \kappa$ of Proposition 2.2. The rest of this section is
devoted to the proof of the following lemma.

Lemma 3.4. Let $d, h \ge 1$ be arbitrary integers and let $\varepsilon > 0$ be some positive
real number. Set $Q = N^{\varepsilon/2d}$ and let $P \subseteq \mathcal{P}(Q)$ satisfy $w(P) \ge \kappa \log Q$ for some

$\kappa > 0$. Suppose $S \subseteq [N]^d$ is a set of size $|S| \gg N^{d-h-1+\varepsilon}$ occupying at most $\alpha p^{d-h}$
residue classes mod $p$ for every prime $p \in P$ and some $\alpha > 0$. Then there exists
$B = O(1)$ and a set of primes $P' \sim P$ such that for each $p \in P'$ there is some
subset $\mathcal{G}_p(S) \subseteq S$, $|\mathcal{G}_p(S)| \gg |S|$, which is $(B, d - h)$-generic mod $p$.

Remarks. Here again the exact value of $Q$ is not important as long as it is a small
power of $N$. Also, as in the previous statements, all the hypotheses are necessary
because of the examples in §5.

Proof. From now on fix an integer $h \ge 1$. If $d < h$ the result is trivial, while for
$d = h$, it follows from Lemma 3.2 with $B = |S|$ and $P' = P$. We will proceed by
induction on $d$. Thus, let $d \ge h + 1$ be some integer and assume the result holds
for every smaller dimension.

Take $S$ and $P$ as in the statement and recall that $\pi_i(S)$ is the projection of $S$
to the $i$th coordinate. We claim that for some $1 \le i \le d$ there exists a set $S' \subseteq S$
with $|S'| \ge |S|/2^d$ such that every $A \subseteq S'$ with $|A| \ge |S'|/2$ satisfies $|\pi_i(A)| > Q$.
Indeed, if the claim fails with $S' = S$ and $i = 1$ we may find some subset $S_1 \subseteq S$
with $|S_1| \ge |S|/2$ and $|\pi_1(S_1)| \le Q$. Then, if the claim fails again with $S' = S_1$ and
$i = 2$, we get some $S_2 \subseteq S_1$ with $|S_2| \ge |S_1|/2 \ge |S|/4$ and $|\pi_1(S_2)|, |\pi_2(S_2)| \le Q$.
Iterating this $d$ times either we get the claim or end up with a set $S_d \subseteq S$ satisfying

\[ |S| \le 2^d |S_d| \le 2^d |\pi_1(S_d)| \cdots |\pi_d(S_d)| \le 2^d Q^d. \]

By our choice of $Q$ this is clearly absurd for sufficiently large $N$ and therefore the
claim follows.

Since it suffices to prove the lemma for such a subset $S'$ we may assume without
loss of generality that $S' = S$ and permuting the coordinates if necessary we may
also assume $i = 1$. Hence, we have that

\[ |\pi_1(A)| > Q \quad \text{for every } A \subseteq S \text{ with } |A| \ge |S|/2. \tag{3.4} \]

We wish to construct a dense subset of $S$ which is in an adequate position to
apply the induction hypothesis. Since we will be working with the first coordinate,
given some $a \in \mathbb{Z}/p\mathbb{Z}$, we will write $S(a; p)$ to refer to those elements of $S$ having
their first coordinate congruent to $a \pmod{p}$. Let $B_1$ be some large constant to be
specified later. Since $|[S]_p| \le \alpha p^{d-h}$, it is clear that there can be at most $\alpha p/B_1$
residue classes $a \in [\pi_1(S)]_p \subseteq \mathbb{Z}/p\mathbb{Z}$ for which $|[S(a; p)]_p| > B_1 p^{d-h-1}$. We denote
by $\mathcal{E}_1(p)$ this exceptional set. Also, we write

\[ \mathcal{E}_2(p) := \left\{ a \in [\pi_1(S)]_p : |S(a; p)| > \frac{B_1}{\alpha p} |S| \right\}. \]

From the obvious fact that $\sum_{a \in \mathbb{Z}/p\mathbb{Z}} |S(a; p)| = |S|$ it follows that $|\mathcal{E}_2(p)| \le \alpha p/B_1$
and therefore $|\mathcal{E}(p)| \le 2\alpha p/B_1$, where $\mathcal{E}(p) := \mathcal{E}_1(p) \cup \mathcal{E}_2(p)$. By means of the larger
sieve we may now deduce that not too many integers in $[N]$ can lie in $\mathcal{E}(p)$ for many
$p \in P$. Indeed, consider the set $X$ which consists of all elements $x \in [N]$ for which

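\[ \sum_{p \in P} 1_{x \,(\mathrm{mod}\, p) \in \mathcal{E}(p)} \frac{\log p}{p} \ge \frac{1}{2} w(P). \]

By the pigeonhole principle, one may then find a set of primes $P_1 \subseteq P$ with $w(P_1) \ge \frac{1}{4} w(P)$
and such that $\left| \bigcup_{a \in \mathcal{E}(p)} X(a; p) \right| \ge \frac{1}{4} |X|$ for every $p \in P_1$. It then follows
from Lemma 3.1 that upon choosing $B_1$ sufficiently large, we can ensure that $|X| < Q$.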

By (3.4), we deduce that $|S \setminus \pi_1^{-1}(X)| > \frac{1}{2}|S|$. We may therefore find a subset
$S' \subseteq S$ with $|S'| \ge \frac{1}{4}|S|$ which does not intersect $\pi_1^{-1}(X)$ and such that $S'_x :=
\pi_1^{-1}(x) \cap S'$ satisfies $|S'_x| \gg N^{d-h-2+\varepsilon}$ for every $x \in \pi_1(S')$. Every such $x$ lies outside
of $X$ and therefore has associated a set of primes $P_x \sim P$ for which $x \pmod{p} \notin
\mathcal{E}(p)$. Since $\mathcal{E}_1(p) \subseteq \mathcal{E}(p)$, we may apply the induction hypothesis to $S'_x$ for every $x$
to see that there exist sets of primes $P'_x \sim P_x$ and constants $c, B_2 > 0$ independent
of $x$, such that for each $p \in P'_x$ there is a $(B_2, d - h - 1)$-generic mod $p$ set $\mathcal{G}_p(S'_x) \subseteq S'_x$
containing at least $c|S'_x|$ elements.

Since the sets $P'_x$ constructed above satisfy $P'_x \sim P$, with the implied constant
independent of $x$, we may apply again the pigeonhole principle to locate some set
of primes $P' \sim P$ and some constant $c > 0$, such that for each $p \in P'$ there are at
least $c|S'|$ elements $s \in S'$ for which $p \in P'_{\pi_1(s)}$. It thus follows that if for a prime
$p \in P'$ we consider the set

\[ \mathcal{G}_p(S) := \bigcup_{x \,:\, p \in P'_x} \mathcal{G}_p(S'_x), \]

then $|\mathcal{G}_p(S)| \gg |S'| \gg |S|$ and $\mathcal{G}_p(S) \cap \pi_1^{-1}(x) = \mathcal{G}_p(S'_x)$ is a $(B_2, d - h - 1)$-generic
set for every $x \in \pi_1(\mathcal{G}_p(S))$. Also, we see that there are at most $\frac{B_1}{\alpha p}|S| \ll \frac{1}{p}|\mathcal{G}_p(S)|$
elements of $\mathcal{G}_p(S)$ having the same first coordinate mod $p$ since by construction it
does not lie in $\mathcal{E}_2(p)$. It thus follows that $\mathcal{G}_p(S)$ is a $(B, d - h)$-generic set for some large
$B$ depending on $B_1$ and $B_2$ but independent of $p$ and this concludes the proof of
Lemma 3.4. □



4. The proof of Proposition 2.2 

In this section we give a proof of Proposition 2.2. As we did in the proof of
Lemma 3.4, we will fix an integer $h$ and induct on $d$. Since for $d \le h$ the result is
either trivial or follows from Lemma 3.2, we may assume $d \ge h + 1$ and that the
result holds for all smaller dimensions.

We are thus given a set $S$ and some positive integer $r$. Our first step will be to
find generic sets inside the sections of $S$ for many primes $p$. Proceeding as in the
beginning of the proof of Lemma 3.4 we may assume that

\[ |\pi_1(A)| > Q \quad \text{for every } A \subseteq S \text{ with } |A| \ge |S|/2. \tag{4.1} \]

This allows us, at the cost of passing to a subset of half density if necessary, to get
the bound

\[ |S_x| \le 2|S|/Q \quad \text{for every } x \in [N], \tag{4.2} \]

where $S_x := \pi_1^{-1}(x) \cap S$. Finally we may also assume, again by passing to a subset
of half density if necessary, that $|S_x| \gg N^{d-h-2+\varepsilon}$ for every $x \in \pi_1(S)$.

Let $B$ be some large constant. For every prime $p$ we denote by $\mathcal{E}(p)$ the set
of residue classes $a \in \mathbb{Z}/p\mathbb{Z}$ for which $|[S(a; p)]_p| > Bp^{d-h-1}$ (recall that $S(a; p)$
stands for those elements of $S$ having their first coordinate congruent to $a \pmod{p}$
and thus $[S(a; p)]_p$ consists of those residue classes in $[S]_p \subseteq (\mathbb{Z}/p\mathbb{Z})^d$ having $a$ as a
first coordinate). Since $|\mathcal{E}(p)| \le \alpha p/B$, applying Lemma 3.1 as in the proof of Lemma
3.4 we conclude by (4.1) that if $B$ is chosen sufficiently large, we may find some
$S' \subseteq S$, $|S'| \gg |S|$, such that for each $x \in \pi_1(S')$ we have $P_x \sim P$, with the implied
constant independent of $x$, and where

\[ P_x := \{ p \in P : x \pmod{p} \notin \mathcal{E}(p) \}. \]

This places us in a position in which we can apply the induction hypothesis to each
section $S'_x$ of $S'$ to find some $\delta_0 \gg 1$ independent of $x$ such that each $S'_x$ admits a
$(r, \delta_0)$-characteristic subset of size $O(r^{d-h-1})$. In particular, we see that at the cost
of passing to a subset of $S'$ of density $\delta_0$ if necessary, we may assume that inside
each $S'_x$ we can find a set of size $O(r^{d-h-1})$ which is $(r, 1)$-characteristic for the
whole section. Notice that since we are refining the sections, we still get a bound of
the form $|S'_x| \gg N^{d-h-2+\varepsilon}$ for every $x \in \pi_1(S')$. Thus, we may also apply Lemma
3.4 to every such $S'_x$ obtaining sets of primes $P'_x \sim P_x$ such that for every $p \in P'_x$
we can find a $(C, d - h - 1)$-generic subset $\mathcal{G}_p(S'_x) \subseteq S'_x$, $|\mathcal{G}_p(S'_x)| \gg |S'_x|$, where $C$
and the implied constants are independent of $p$ and $x$. In particular, we may find
some set of primes $P' \sim P$ such that for each $p \in P'$ the set

\[ \mathcal{G}_p(S) := \bigcup_{x \,:\, p \in P'_x} \mathcal{G}_p(S'_x), \]

satisfies $|\mathcal{G}_p(S)| \gg |S'| \gg |S|$ and each nonempty section $(\mathcal{G}_p(S))_x$ of $\mathcal{G}_p(S)$ is a
$(C, d - h - 1)$-generic set.

From now on we write $\mathcal{G}_p := \mathcal{G}_p(S)$. The next lemma is crucial as it allows us
to find sections of $S$ containing the residue class of many elements of $S$ for many
primes $p$.

Lemma 4.1. There exists a set $\mathcal{B} \subseteq S'$, $|\mathcal{B}| \gg |S|$, such that for every nonempty
section $\mathcal{B}_x$ of $\mathcal{B}$ there is a set of primes $P_x \sim P' \sim P$ with

\[ \left| \{ s \in S' : [s]_p \in [\mathcal{B}_x]_p \} \right| \ge \frac{c}{p} |S| \]

for every $p \in P_x$, where $c > 0$ does not depend on $x$ or $p$.

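Proof. We begin by fixing a prime $p \in P'$ and considering some residue class $a \in
[\pi_1(\mathcal{G}_p)]_p$. Since $p$ is fixed we will simply write $\mathcal{G}_p(a)$ to denote those elements of
$\mathcal{G}_p$ with first coordinate congruent to $a \pmod{p}$. Also, given a class $\mathbf{b} \in (\mathbb{Z}/p\mathbb{Z})^d$
we write $\mathcal{G}_p(\mathbf{b})$ for those elements of $\mathcal{G}_p$ congruent to $\mathbf{b} \pmod{p}$. By the pigeonhole
principle and the fact that by construction of $P'$ it is $|[\mathcal{G}_p(a)]_p| \le Bp^{d-h-1}$ it
follows that we may find some $\mathbf{b}_1 \in [\mathcal{G}_p(a)]_p \subseteq (\mathbb{Z}/p\mathbb{Z})^d$ with

\[ |\mathcal{G}_p(\mathbf{b}_1)| \ge |\mathcal{G}_p(a)| / (Bp^{d-h-1}). \]

Consider now the set $\mathcal{B}_1 \subseteq \mathcal{G}_p(a)$ defined by

\[ \mathcal{B}_1 := \bigcup_{s \,:\, [s]_p = \mathbf{b}_1} (\mathcal{G}_p)_{\pi_1(s)}, \tag{4.3} \]

that is, $\mathcal{B}_1$ is the union of those sections $(\mathcal{G}_p)_x$ in $\mathcal{G}_p$ containing a representative of
$\mathbf{b}_1$.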

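Since each $(\mathcal{G}_p)_x$ is a $(C, d - h - 1)$-generic set, we have that

\[ |(\mathcal{G}_p)_x| \ge \frac{p^{d-h-1}}{C} |(\mathcal{G}_p)_x(\mathbf{b}_1)| \]

and therefore

\[ |\mathcal{B}_1| \ge \frac{p^{d-h-1}}{C} |\mathcal{G}_p(\mathbf{b}_1)| \ge \frac{1}{BC} |\mathcal{G}_p(a)|. \tag{4.4} \]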



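Notice now that since $|\mathcal{G}_p(a)| \ge |\mathcal{B}_1|$ and $|[\mathcal{G}_p(a)]_p| \le Bp^{d-h-1}$, by the first
inequality of (4.4) and the pigeonhole principle we may find another residue class
$\mathbf{b}_2 \in [\mathcal{G}_p(a)]_p$ with

\[ |\mathcal{G}_p(\mathbf{b}_2)| \ge \frac{1}{Bp^{d-h-1}} |\mathcal{G}_p(a) \setminus \mathcal{G}_p(\mathbf{b}_1)| \ge \frac{1}{Bp^{d-h-1}} \left( 1 - \frac{C}{p^{d-h-1}} \right) |\mathcal{G}_p(a)|, \]

which is at least $|\mathcal{G}_p(a)| / (2Bp^{d-h-1})$ if $p^{d-h-1} \ge 2C$. In such a case, if we now
define $\mathcal{B}_2$ as in (4.3), but this time with respect to $\mathbf{b}_2$, the same reasoning that gives
(4.4) implies $|\mathcal{B}_2| \ge \frac{1}{2BC} |\mathcal{G}_p(a)|$. Iterating this process we end up with a sequence
$\{\mathbf{b}_1, \dots, \mathbf{b}_q\}$ of residue classes, $q = \left\lceil \frac{p^{d-h-1}}{2C} \right\rceil$, satisfying

\[ |\mathcal{G}_p(\mathbf{b}_j)| \ge \frac{1}{Bp^{d-h-1}} \left| \mathcal{G}_p(a) \setminus \bigcup_{i=1}^{j-1} \mathcal{G}_p(\mathbf{b}_i) \right| \ge \frac{1}{Bp^{d-h-1}} \left( 1 - \frac{(j-1)C}{p^{d-h-1}} \right) |\mathcal{G}_p(a)| \ge \frac{|\mathcal{G}_p(a)|}{2Bp^{d-h-1}} \]

and $|\mathcal{B}_j| \ge \frac{1}{2BC} |\mathcal{G}_p(a)|$. In particular, we have that

\[ \sum_{s \in \mathcal{G}_p(a)} \sum_{j=1}^{q} 1_{s \in \mathcal{B}_j} \ge \frac{q}{2BC} |\mathcal{G}_p(a)|. \tag{4.5} \]

Now, we consider the set

\[ \mathcal{B}[a] := \left\{ s \in \mathcal{G}_p(a) : \sum_{j=1}^{q} 1_{s \in \mathcal{B}_j} \ge \frac{q}{4BC} \right\}. \]

Notice that $\mathcal{B}[a]_x := \mathcal{B}[a] \cap \pi_1^{-1}(x)$ equals $(\mathcal{G}_p)_x$ whenever this intersection is not
empty. Also, (4.5) implies

\[ |\mathcal{B}[a]| \ge \frac{1}{4BC} |\mathcal{G}_p(a)|. \tag{4.6} \]

We see that $\mathcal{B}[a]$ is very close to what we want, since if we take any nonempty
section $\mathcal{B}[a]_x$ of this set, then there are at least $|\mathcal{G}_p(a)| / (4BC)^2$ elements $s \in \mathcal{G}_p(a)$
such that $s \equiv y \pmod{p}$ for some $y \in \mathcal{B}[a]_x$.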

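We now let $\mathcal{R} \subseteq [\pi_1(S')]_p$ consist of those residue classes $a \in \mathbb{Z}/p\mathbb{Z}$ with $|\mathcal{G}_p(a)| \ge \frac{1}{2p}|\mathcal{G}_p|$ and write

$$\mathcal{B}[p] := \{ s \in S' : S'_{\pi_1(s)} \cap \mathcal{B}[a] \neq \emptyset \text{ for some } a \in \mathcal{R} \}.$$

In other words, $\mathcal{B}[p]$ consists of those sections of $S'$ intersecting $\bigcup_{a \in \mathcal{R}} \mathcal{B}[a]$. In particular, since $\mathcal{B}[p]$ contains the disjoint union $\bigcup_{a \in \mathcal{R}} \mathcal{B}[a]$, it follows from (4.6) and the definition of $\mathcal{R}$ that

$$|\mathcal{B}[p]| \ge \frac{1}{8BC}\,|\mathcal{G}_p| \ge c|S|,$$

for some constant $c$ independent of $p$.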

Recall now that $w(P') \ge c \log Q$. For an element $s \in S'$ write $P'_s$ for the set of primes $p \in P'$ for which $s \in \mathcal{B}[p]$. It follows from the above paragraph that for an appropriate choice of $c$ the set

$$\mathcal{B} := \{ s \in S' : w(P'_s) \ge c \log Q \}, \tag{4.7}$$

satisfies $|\mathcal{B}| \ge c|S|$. It is easy to check that $\mathcal{B}$ is of the desired form. □



To conclude the proof of Proposition 2.2 we will show that if an $r$-polynomial vanishes at the sections $\mathcal{B}_x$ for $\gg_r 1$ distinct values of $x$, then it must also vanish at a positive proportion of $S$. To this end, we choose $m$ distinct sections of $S'$ having nontrivial intersection with $\mathcal{B}$, where $m = O_r(1)$ is to be specified later. Notice that by (4.2) and Lemma 4.1 this is always possible provided $N$ is sufficiently large. Call $\mathcal{L} = \{S'_{x_1}, \ldots, S'_{x_m}\}$ this set of sections. Let $P_{\mathcal{L}}$ consist of those primes $p$ for which there exists a pair of sections $S'_{x_i} \neq S'_{x_j}$ in $\mathcal{L}$ with $[S'_{x_i}]_p \cap [S'_{x_j}]_p \neq \emptyset$. Given such a pair of sections, the fact that $[S'_{x_i}]_p \cap [S'_{x_j}]_p \neq \emptyset$ implies in particular that $x_i \equiv x_j \pmod p$. Since $x_i \neq x_j$ this implies that the sum of $\log p$ over such primes is bounded by $\log N$. Thus, we see that

$$\sum_{p \in P_{\mathcal{L}}} \log p \le m^2 \log N, \tag{4.8}$$

and this implies that $w(P_{\mathcal{L}}) \ll_r \log\log N$.
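We now consider on $S'$ the function

$$\psi_{\mathcal{L}}(s) := \sum_{p \le Q} \mathbf{1}_{\exists x \in \mathcal{L} \,:\, s \equiv x \ (\mathrm{mod}\ p)} \log p.$$

Thus, $\psi_{\mathcal{L}}(s)$ measures the extent to which the residue classes occupied by $s$ have a representative in $\mathcal{L}$. If we write $P_i$ to denote the set of primes in Lemma 4.1 corresponding to the section $S'_{x_i} \cap \mathcal{B}$ of $\mathcal{B}$, it follows from this lemma and (4.8) that

$$\sum_{s \in S'} \psi_{\mathcal{L}}(s) \ge \sum_{i=1}^{m} \sum_{p \in P_i \setminus P_{\mathcal{L}}} \sum_{s \in S'} \mathbf{1}_{\exists x \in S'_{x_i} \,:\, s \equiv x \ (\mathrm{mod}\ p)} \log p$$

$$\ge m|S| \left( c \log Q + O_r(\log\log N) \right)$$

$$\ge c_0 m |S| \log Q,$$

for some $c_0 > 0$ and sufficiently large $N$.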

Set (5 ~ ^ and suppose there are at most d\S\ elements s £ S' with i'cis) > 
$3r \log N$. Since $\psi_{\mathcal{L}}(s) \le m \log N$ for every $s \notin \mathcal{L}$ we conclude that

$$c_0 m |S| \log Q \le |\mathcal{L}|\, 2Q + |S'|\, 3r \log N + \delta |S'|\, m \log N,$$

where we used that $\sum_{p \le Q} \log p \le 2Q$ for large $Q$. Hence, by (4.2) we derive that



< 3r. 



2d \ogN^ 

Taking $m = 7r/\delta$ we get a contradiction for sufficiently large $N$. We may therefore assume that the set

$$A := \{ s \in S' : \psi_{\mathcal{L}}(s) \ge 3r \log N \}$$

has size $|A| > \delta|S|$ for the above choices of $m$ and $\delta$.

We will now show that if an $r$-polynomial vanishes at $\mathcal{L}$, then it also vanishes at $A$. Indeed, let $f$ be such a polynomial and let $x \in A$ be arbitrary. By definition, we have $|f(x)| < N^{3r}$. On the other hand, if $p$ is a prime for which there exists some $y \in \mathcal{L}$ with $x \equiv y \pmod p$, then the fact that $f(y) = 0$ implies that $p \mid f(x)$. But by definition of $A$ the product of all such $p$ is at least $N^{3r}$, so the only way for this to hold is to have $f(x) = 0$, which proves our claim.
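The divisibility step above can be checked numerically. The following Python sketch is only an illustration: the polynomial `f`, its root, and the list of primes are hypothetical choices that do not come from the paper. It shows that a point congruent to a root modulo several primes takes a value divisible by their product, and that an integer whose absolute value is smaller than that product and which is divisible by it must be zero.

```python
from math import prod


def f(x, y):
    # Hypothetical integer polynomial, chosen only for illustration.
    return x * x - y - 1


def multiples_below(bound, primes):
    """List the integers v with |v| < bound divisible by every prime in `primes`.

    The primes are assumed to be distinct, so v must be a multiple of their product.
    """
    modulus = prod(primes)
    return [v for v in range(-bound + 1, bound) if v % modulus == 0]


primes = [5, 7, 11]
modulus = prod(primes)
root = (3, 8)  # f(3, 8) == 0

# A point congruent to the root modulo each prime in the list.
point = (root[0] + modulus, root[1] + 2 * modulus)

# f(point) is divisible by every prime p for which point ≡ root (mod p).
divisible = all(f(*point) % p == 0 for p in primes)

# If the size bound does not exceed the product of the primes, only 0 survives.
forced_zero = multiples_below(modulus, primes) == [0]
```

In this toy case `f(*point)` is a nonzero multiple of the product of the primes, which is consistent with the argument: vanishing is forced only once the product of the primes reaches the a priori bound on $|f(x)|$, which is what the definition of $A$ guarantees.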

By the induction hypothesis and our construction of S' we know that for each 
5^. £ C we may find a (r, l)-characteristic set of size 0{r'^^^^^). Taking the union 
of these m sets we have thus found a set of size 0{r'^~^) which is (r, ^)-characteristic 
for $S$, with $\delta$ as above. This concludes the proof of Proposition 2.2.

5. Ill-distributed sets with no algebraic structure 

In this section we provide some examples of high dimensional ill-distributed sets possessing no algebraic structure. In particular, we show that the assertion of Theorem 1.1 fails when $\varepsilon = 0$. To begin with, we use a slight modification of the construction given in [11, §4.3] to see that, given any $0 < \eta < 1$, one may construct a subset of $[N]$ of size $\gg (\log N)^{\eta}$ which occupies at most $p^{\eta}$ residue classes for every prime $p$ and which possesses no algebraic structure. Indeed, if $N$ is sufficiently large, we may find some integer $Q$ with $Q < \log N < 2Q$ such that the product of
all primes p < Q, say i?, satisfies N'^/^ < R < N (this, of course, is very crude). 
For each prime $p < Q$ choose $\lfloor p^{\eta} \rfloor$ residue classes. Then, by the Chinese remainder theorem, there are $\prod_{p < Q} \lfloor p^{\eta} \rfloor$ elements below $R$ belonging to a selected class for every $p < Q$. Choose $\lfloor (\log N)^{\eta}/2 \rfloor$ of these elements and call this set $X$. Notice that for all primes $p > Q$ we have $p^{\eta} > |X|$ and therefore $X$ occupies at most $p^{\eta}$ residue classes for these primes $p$. Since by construction it also occupies at most that many classes for all primes $p < Q$, we get the claim.
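The Chinese remainder step of this construction can be sketched in Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, the selected classes are taken to be $0, 1, \ldots, \lfloor p^{\eta} \rfloor - 1$ (any other choice would do), and the parameters in the usage line are tiny values chosen so the set can be enumerated, not values satisfying the asymptotic conditions of the text.

```python
from itertools import product


def primes_below(q):
    """Primes p < q by trial division (adequate for small q)."""
    return [p for p in range(2, q)
            if all(p % d for d in range(2, int(p ** 0.5) + 1))]


def crt_pair(r1, m1, r2, m2):
    """Combine x ≡ r1 (mod m1) and x ≡ r2 (mod m2) for coprime m1, m2."""
    inverse = pow(m1, -1, m2)  # modular inverse, Python 3.8+
    step = ((r2 - r1) * inverse) % m2
    return (r1 + m1 * step) % (m1 * m2), m1 * m2


def sparse_residue_set(q, eta):
    """Integers below R = product of primes p < q lying in floor(p**eta)
    selected residue classes modulo each such prime."""
    primes = primes_below(q)
    # Assumption: the selected classes are 0, 1, ..., floor(p**eta) - 1.
    classes = {p: list(range(int(p ** eta))) for p in primes}
    elements = []
    for choice in product(*(classes[p] for p in primes)):
        residue, modulus = 0, 1
        for p, a in zip(primes, choice):
            residue, modulus = crt_pair(residue, modulus, a, p)
        elements.append(residue)
    return primes, classes, sorted(elements)


primes, classes, elements = sparse_residue_set(14, 0.5)
```

With these toy parameters the primes are 2, 3, 5, 7, 11, 13, and the set has one element for each tuple of selected classes. A subset of the required size would then be extracted from `elements`, as in the text.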

We now proceed to give some examples of ill-distributed sets with no algebraic structure. The first one already shows that Theorem 1.1 is best possible.

Example 5.1. This follows readily from the above construction. Fix some pair of positive integers $d, h$ with $d > h + 1$ and consider $h + 1$ different sets $X_1, \ldots, X_{h+1}$ constructed as in the previous paragraph with $\eta = 1/(h+1)$. If we define the set

$$S := \{ (x_1, \ldots, x_d) \in [N]^d : x_i \in X_i \ \ \forall\, 1 \le i \le h+1 \},$$

then we have that $|S| \gg N^{d-h-1} \log N$ while $|[S]_p| < p^{d-h}$ for every prime $p$, from where it follows that we cannot take $\varepsilon = 0$ in Theorem 1.1.

Example 5.2. One can generalize the above example by "perturbing" arbitrary algebraic sets. We show a simple instance of this. Let $d = 3$ and consider two polynomials $f, g \in \mathbb{Z}[x]$. Let $X$ and $Y$ be sets of size $\gg (\log N)^{1/2}$ occupying at most $p^{1/2}$ residue classes for every prime $p$. Then, we see that

$$\{ (x, f(x) - X, g(x) - Y) : x \in [N] \}$$

is a big set of integer points occupying at most $p^2$ residue classes.

Finally, we show that not all counterexamples are perturbations of strongly algebraic sets.

Example 5.3. Fix some small $\varepsilon > 0$. By the Chinese remainder theorem one can construct a set $X \subseteq [N]$ of size $|X| \gg N^{1-\varepsilon}$ occupying only one residue class for every prime $p < \varepsilon \log N$. Take $K = \lfloor (\varepsilon \log N)^{1/3} \rfloor$ and let $f_1, \ldots, f_K, g_1, \ldots, g_K$ be a family of polynomials. Also, let $X_1, \ldots, X_K, Y_1, \ldots, Y_K$ be arbitrary sets of size at most $(\varepsilon \log N)^{1/3}$. Then

$$\bigcup_{i=1}^{K} \{ (x, f_i(x) - X_i, g_i(x) - Y_i) : x \in X \}$$

is a big set of integer points occupying at most $p^2$ residue classes for every prime $p$. Notice that this construction is of a different nature than the one given in Example 5.2, since the union of that many algebraic sets hardly retains any algebraic structure itself.

It follows from the above examples that strange things can happen if one allows the set to possess too many very small sections. However, we shall show in Theorem 6.1 below that the methods of this paper do indeed work as long as one avoids this type of situation.

6. Further results and conjectures 

6.1. A generalization of Theorem 1.1. We now state the most general result which follows at once from the methods of this paper. To do this we need the concept of a $(k, \eta, \varepsilon, \rho)$-regular set. When $k = 1$ we just take this to mean $|S| \ge N^{\varepsilon}$. Recursively, for any positive integer $k$, we say $S \subseteq [N]^d$ is $(k, \eta, \varepsilon, \rho)$-regular if for any $S' \subseteq S$ of size $|S'| \ge \eta|S|$ we can find a subset $S'' \subseteq S'$, $|S''| \ge \rho|S|$, satisfying

\TTiiA)\ > for every A C S" with \A\ > \S"\/2 

for some $1 \le i \le d$ and such that $S'' \cap \pi_i^{-1}(x)$, $i$ as before, is either empty or $(k-1, 1, \varepsilon, \rho)$-regular for every $x \in [N]$. Although this definition seems complicated, one would expect a reasonable set to satisfy it for an appropriate choice of the parameters. As an example for which we shall give an application below, we notice that given any function $f : [N]^r \to [N^k]^t$ with $k, r, t$ arbitrary positive integers, the graph of $f$ is a $(k, \eta, 1/2r, 1/2)$-regular set provided $N$ is large in terms of $\eta$. This definition was chosen so that it satisfies all the assumptions made during the proof of Theorem 1.1. Therefore, we immediately get the following generalization of this result.

Theorem 6.1. Let $k > 0$ be some integer and $\eta, \varepsilon, \alpha, \rho > 0$ real numbers. Then there exists $C = O_{k,\eta,\varepsilon,\alpha,\rho}(1)$ such that for any $(k, \eta, \varepsilon, \rho)$-regular set $S$ occupying less than $\alpha p^k$ residue classes for every prime $p$ there exists a polynomial $f \in \mathbb{Z}[x_1, \ldots, x_d]$ of degree at most $C$ and coefficients bounded by $N^C$, such that $f$ vanishes at more than $(1 - \eta)|S|$ points of $S$.

It is important to note that one cannot hope to do much better than Theorem 6.1 in this generality, since the regularity conditions are necessary in order to avoid those constructions emerging from the Chinese Remainder Theorem as in §5.

6.2. Approximate reduction. We shall give a quick application of Theorem 6.1 to the study of functions preserving some structure when reduced modulo a prime, that is, functions $f$ for which knowing the class of $x \pmod p$ gives us information about the class of $f(x) \pmod p$. Thus, given a positive integer $K$, we say a function $f : [N]^r \to [N^k]^t$ has $K$-approximate reduction if $|[f([N]^r(a))]_p| \le K$ for every $a \in (\mathbb{Z}/p\mathbb{Z})^r$ and every prime $p$. When $K = 1$ this implies the very strong property of recurrence mod $p$ and using this, it was shown by Hall [5] and Ruzsa [15] (see also [10, §XV.4]) that for large $N$ the only functions having 1-approximate reduction are polynomials (notice that we are assuming our functions to have polynomial growth, which is in fact a necessary condition [9]). It follows from Theorem 6.1 that this is indeed a very robust phenomenon:

Corollary 6.2. Suppose $f : [N]^r \to [N^k]^t$ has $K$-approximate reduction and let $\Gamma(f)$ be the graph of $f$. Then there exists $C = O_{k,r,t,K}(1)$ and a polynomial $P \in \mathbb{Z}[x_1, \ldots, x_{r+t}]$ of degree at most $C$ and coefficients bounded by $N^C$, such that $P$ vanishes at more than $(1 - \eta)|\Gamma(f)|$ points of $\Gamma(f)$.
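The notion of $K$-approximate reduction is easy to test on small cases. The Python sketch below treats only the one-variable situation $r = t = 1$; the function names and the two sample functions are hypothetical choices made for illustration. It computes, for a given prime, the largest number of residue classes that $f$ takes on a single residue class of the argument, and it checks that the graph of the non-polynomial function $x \mapsto \lfloor x/2 \rfloor$ lies in the zero set of a low degree polynomial, as the corollary would suggest.

```python
def reduction_width(f, n, p):
    """Largest number of classes mod p taken by f on one class of x mod p,
    for x ranging over {0, ..., n}."""
    images = {}
    for x in range(n + 1):
        images.setdefault(x % p, set()).add(f(x) % p)
    return max(len(classes) for classes in images.values())


def cubic(x):
    # An integer polynomial: its class mod p is determined by x mod p.
    return x ** 3 + 2 * x + 1


def half(x):
    # Not a polynomial, but f(x) mod p takes at most two values per class.
    return x // 2


def vanishing_polynomial(x, y):
    # Vanishes on the graph of `half`, since 2 * (x // 2) - x is 0 or -1.
    return (2 * y - x) * (2 * y - x + 1)


small_primes = [2, 3, 5, 7, 11]
cubic_widths = [reduction_width(cubic, 200, p) for p in small_primes]
half_widths = [reduction_width(half, 200, p) for p in small_primes]
```

Here `cubic_widths` consists of ones and `half_widths` of twos, so within this range `half` behaves like a function with 2-approximate reduction, and its whole graph satisfies the quadratic equation encoded by `vanishing_polynomial`.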

6.3. The Inverse Sieve Problem for d = 1. We conclude by mentioning a very 
strong version of the inverse sieve problem which is conjectured to hold for sets 
S C [N] (see [S Problem 7.2] and (n]). 

Conjecture 6.3. Suppose $S \subseteq [N]$ is some set of integers of size $|S| \ge N^{\varepsilon}$ occupying less than $\alpha p$ residue classes for some $0 < \alpha < 1$ and every prime $p$. Then most of $S$ is contained in the image of an integer polynomial of degree bounded in terms of $\alpha$ and $\varepsilon$.

As a more precise instance of this, they conjecture for example that if a set 
S has size \S\ > N^-'^^ say, and occupies less than 2p/3 residue classes mod 
$p$ for every prime $p$, then most of $S$ must be contained in a set of the form $\{an^2 + bn + c : n \in \mathbb{Z}\}$. This can be seen as an inverse conjecture for the large
sieve [51 Hi]. 



Conjecture 6.3 seems to be hard. For example, it was shown by Green that if the residue classes occupied by $S$ lie outside some interval of length $(p-1)/2$ then $|S| \ll N^{1/3+\varepsilon}$ for any $\varepsilon > 0$. However, even in this particular case, to get a bound of the form $|S| \ll N^{\varepsilon}$, it seems necessary to appeal to very deep conjectures of analytic number theory like the exponent pair conjecture (see [5]). Furthermore, as
noted by Helfgott and Venkatesh [11, §4.2], Conjecture 6.3 implies that there are $\ll_{\varepsilon} N^{\varepsilon}$ points on an irrational curve, which is itself a well known open problem.